Compact Lexicon Selection with Spectral Methods
Abstract
In this paper, we introduce the task of selecting a compact lexicon from large, noisy gazetteers. This scenario arises often in practice, particularly in spoken language understanding (SLU). We propose a simple and effective solution based on two matrix decomposition techniques: canonical correlation analysis (CCA) and rank-revealing QR (RRQR) factorization. CCA is first used to derive low-dimensional gazetteer embeddings from domain-specific search logs. RRQR is then used to find a subset of these embeddings whose span approximates the entire lexicon space. Experiments on slot tagging show that our method yields a small set of lexicon entities with an average relative error reduction of more than 50% over a randomly selected lexicon.
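The two-step pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It rests on several assumptions that do not come from the paper: the search logs are summarized as a hypothetical entity-by-context count matrix, CCA is approximated by a truncated SVD of that matrix after scaling by the row and column totals, and RRQR is approximated by greedy column pivoting, a common heuristic for rank-revealing QR. The function names `cca_embeddings` and `select_by_pivoted_qr`, the `smooth` parameter, and the toy counts are all illustrative.

```python
import numpy as np


def cca_embeddings(counts, dim, smooth=1.0):
    """Embed each entity (row) of an entity-by-context count matrix.

    Assumption: CCA between the entity view and the context view is
    approximated by scaling the co-occurrence counts with the inverse
    square roots of the (smoothed) row and column totals and taking the
    top `dim` left singular vectors.
    """
    counts = np.asarray(counts, dtype=float)
    row_totals = counts.sum(axis=1) + smooth
    col_totals = counts.sum(axis=0) + smooth
    scaled = counts / np.sqrt(row_totals)[:, None] / np.sqrt(col_totals)[None, :]
    u, _, _ = np.linalg.svd(scaled, full_matrices=False)
    return u[:, :dim]


def select_by_pivoted_qr(embeddings, k, tol=1e-12):
    """Pick up to k entities whose embeddings span most of the space.

    Greedy column pivoting: repeatedly take the entity with the largest
    residual norm, then project its direction out of all the others.
    """
    residual = np.array(embeddings, dtype=float).T  # columns are entities
    chosen = []
    for _ in range(k):
        norms = np.linalg.norm(residual, axis=0)
        j = int(np.argmax(norms))
        if norms[j] < tol:  # everything left is already in the span
            break
        chosen.append(j)
        q = residual[:, j] / norms[j]
        residual -= np.outer(q, q @ residual)
    return chosen


# Toy usage: 5 hypothetical gazetteer entries observed in 3 query contexts.
toy_counts = np.array([
    [9, 0, 1],
    [8, 1, 0],
    [0, 7, 2],
    [1, 6, 0],
    [3, 3, 3],
])
toy_embeddings = cca_embeddings(toy_counts, dim=2)
print(select_by_pivoted_qr(toy_embeddings, k=2))
```

The indices returned by `select_by_pivoted_qr` identify the entries that would be kept in the compact lexicon. In a real setting the count matrix would be large and sparse, so a sparse truncated SVD and a library pivoted QR would be more appropriate than these dense routines.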
Similar resources
Underspecified Phonological Features for Lexical Access
The FUL (featurally underspecified lexicon) system of automatic speech recognition is based on the representation of words in the lexicon with underspecified distinctive features. The speech signal is converted from the waveform into an online spectral representation made up of LPC formants and a few parameters describing the overall spectral shape. These spectral parameters are converted into ...
The Effect of Lexicon-based Debates on the Felicity of Lexical Equivalents in Translating Literary Texts by Iranian EFL Learners
This study was an attempt to investigate the effect of lexicon-based debates on the felicity of lexical equivalents in translating literary texts by Iranian EFL learners. To fulfill the purpose of this study, 59 university students, majoring in English Translation, were randomly assigned to the experimental and control groups from a total of 73 students based on their performance on a mock TOE...
Higher Derivations Associated with the Cauchy-Jensen Type Mapping
Let H be an infinite-dimensional Hilbert space and K(H) be the set of all compact operators on H. We adopt the spectral theorem for compact self-adjoint operators to investigate higher derivations and higher Jordan derivations on K(H) associated with the following Cauchy-Jensen type functional equation: $2f\left(\frac{T+S}{2}+R\right)=f(T)+f(S)+2f(R)$ for all $T,S,R\in K(H)$.
Learning Compact Lexicons for CCG Semantic Parsing
We present methods to control the lexicon size when learning a Combinatory Categorial Grammar semantic parser. Existing methods incrementally expand the lexicon by greedily adding entries, considering a single training datapoint at a time. We propose using corpus-level statistics for lexicon learning decisions. We introduce voting to globally consider adding entries to the lexicon, and pruning ...